Framework for expression-templates evaluation based on views #395
We may decide that offering overloads like
  auto xe = view_expression(x);
  auto ye = view_expression(y);
  auto expression = xe*exp(-ye) - 2*sin(ye);
This would keep our overloads working on OUR types (plus scalars). It would also reduce the number of overloads that need to be implemented... Thoughts?
Notes: adding support for all ranks 0-8 was actually quite easy, so I just did it. I was also able to remove the hard-coding of the return type as Real. This required adding a small static fcn to each expression that specifies its return type, so that upstream expressions (like BinOp or math fcns) can infer their own return type.
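As a rough illustration of the return-type inference described above, here is a minimal sketch. It is not the PR's code: `ConstExpression`, `SumExpression`, and the static function name `ret_type` are hypothetical, and a constant leaf stands in for a view. The idea is that each node reports its result type, and a binary node derives its own from what its children report.

```cpp
#include <cassert>
#include <type_traits>

// Leaf holding a single value of type T (hypothetical stand-in for a view of T).
template<typename T>
struct ConstExpression {
  T value;
  // Hypothetical static function: only its return type matters to parent nodes.
  static constexpr T ret_type () { return T{}; }
  constexpr T eval (int) const { return value; }
};

// Binary node: its result type is whatever "left + right" yields for the
// children's result types (usual arithmetic conversions).
template<typename ELeft, typename ERight>
struct SumExpression {
  ELeft  l;
  ERight r;
  using ret_t = decltype(ELeft::ret_type() + ERight::ret_type());
  static constexpr ret_t ret_type () { return ret_t{}; }
  constexpr ret_t eval (int i) const { return l.eval(i) + r.eval(i); }
};

using IntLeaf    = ConstExpression<int>;
using DoubleLeaf = ConstExpression<double>;

// int + int stays int; int + double is promoted to double; nesting propagates.
static_assert(std::is_same<SumExpression<IntLeaf,IntLeaf>::ret_t, int>::value, "");
static_assert(std::is_same<SumExpression<IntLeaf,DoubleLeaf>::ret_t, double>::value, "");
static_assert(std::is_same<
  SumExpression<SumExpression<IntLeaf,IntLeaf>,DoubleLeaf>::ret_t, double>::value, "");
```

With this pattern no node hard-codes a result type, at the cost of every expression class having to provide the static function.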
We can deduce everything
Force user to build ViewExpression manually, and then use it in expressions
Overload max/min fcn, as well as unary minus
Instead of checking if the operands are arithmetic, check if they are an Expression type. This allows expanding later to non-arithmetic, non-expression types (like Pack)
Updates:
These mods considerably reduce the number of overloads we need to implement/maintain, at the small cost of a couple of if constexpr branches (hence, compiled away) in the eval method implementation.
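To make the trade-off concrete, here is a hedged sketch of how a single operator template plus `if constexpr` could replace separate expression/expression, expression/scalar, and scalar/expression overloads. All names (`is_expr_v`, `IndexExpression`, `ProdExpression`) are hypothetical and the PR's actual classes may differ.

```cpp
#include <type_traits>

// CRTP base used only as a tag in this sketch.
template<typename Derived>
struct Expression {};

// Hypothetical trait: true if T derives from Expression<T>, false for scalars.
template<typename T>
constexpr bool is_expr_v = std::is_base_of<Expression<T>,T>::value;

// Toy leaf whose value at index i is i itself.
struct IndexExpression : Expression<IndexExpression> {
  double eval (int i) const { return i; }
};

// One product node for all operand combinations. Operands are stored by
// value, whether they are expressions or plain scalars.
template<typename ELeft, typename ERight>
struct ProdExpression : Expression<ProdExpression<ELeft,ERight>> {
  static_assert(is_expr_v<ELeft> || is_expr_v<ERight>,
                "at least one operand must be an expression");
  ProdExpression (const ELeft& l, const ERight& r) : m_l(l), m_r(r) {}

  double eval (int i) const {
    // The branches are resolved at compile time, so only one survives.
    if constexpr (is_expr_v<ELeft> && is_expr_v<ERight>) {
      return m_l.eval(i) * m_r.eval(i);
    } else if constexpr (is_expr_v<ELeft>) {
      return m_l.eval(i) * m_r;
    } else {
      return m_l * m_r.eval(i);
    }
  }

  ELeft  m_l;
  ERight m_r;
};

// A single operator template, enabled only if an expression is involved,
// so it never hijacks operations between unrelated types.
template<typename L, typename R,
         typename = std::enable_if_t<is_expr_v<L> || is_expr_v<R>>>
ProdExpression<L,R> operator* (const L& l, const R& r) {
  return ProdExpression<L,R>(l, r);
}
```

Under these assumptions, `2 * idx * idx` and `idx * 0.5` (with `idx` an `IndexExpression`) both resolve to the same operator template.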
class Expression {
public:

  int num_indices () const { return cast().num_indices(); }
I think this could be turned into a static fcn, as the number of indices used by an expression is fully determined by its type.
So far, this fcn is only used in the evaluate call, to make sure that the rank of the view passed to the evaluation fcn is compatible with the expression to evaluate. I may revisit this pattern in a follow-up pr...
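A minimal sketch of the static variant suggested in this comment, assuming the index count can be computed from template parameters alone. `RankedLeaf`, `BinaryNode`, and `rank_matches` are hypothetical names, and taking the maximum over the operands (so that rank-0 operands combine with higher-rank ones) is an assumption, not something the PR states.

```cpp
#include <algorithm>

// CRTP base forwarding to the derived class without needing an instance.
template<typename Derived>
struct Expression {
  static constexpr int num_indices () { return Derived::num_indices(); }
};

// Hypothetical leaf whose rank is a template parameter (e.g. a rank-N view).
template<int Rank>
struct RankedLeaf : Expression<RankedLeaf<Rank>> {
  static constexpr int num_indices () { return Rank; }
};

// Binary node: assumed here to need as many indices as its larger operand.
template<typename ELeft, typename ERight>
struct BinaryNode : Expression<BinaryNode<ELeft,ERight>> {
  static constexpr int num_indices () {
    return std::max(ELeft::num_indices(), ERight::num_indices());
  }
};

// The rank check done at evaluation time becomes a compile-time constant.
template<int ResultRank, typename E>
constexpr bool rank_matches (const Expression<E>&) {
  return E::num_indices() == ResultRank;
}

using Node = BinaryNode<RankedLeaf<2>, RankedLeaf<0>>;
static_assert(Node::num_indices() == 2, "rank is known from the type alone");
static_assert(Expression<Node>::num_indices() == 2, "base forwards to derived");
```

An evaluation function could then use `static_assert` on the rank instead of a runtime check.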
template<typename ELeft, typename ERight>
class CmpExpression : public Expression<CmpExpression<ELeft,ERight>> {
public:
  using ret_t = int;
I used int for some reason, but it could probably be changed to bool. I already have a follow-up branch where I extend the expressions framework to work with Packs, so it can be changed there.
Motivation
I toyed over the weekend with some code to enable expression templates in eamxx. Then I thought that it was maybe easier to work directly with views, so I moved to ekat. The framework should be lightweight in terms of performance, but I haven't tested it yet. I tested an earlier version in eamxx, which had more runtime overhead, and it was ~50% slower than a big manual loop. I am hoping that this ekat version (fully templated) will be faster and closer to parity with a manual loop.
The framework allows creating expressions from views and scalars, and then evaluating them (storing the result in a view). Something of the form
The first line creates an "expression", which is a compile-time evaluation tree. The second line performs a flat parallel for to evaluate it (uses RangePolicy in 1d, and MDRangePolicy otherwise).
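For readers unfamiliar with the pattern, the following is a self-contained sketch of the general idea, not the PR's implementation. It assumes a 1d case only, uses `std::vector<double>` in place of a view, and a serial loop in place of the parallel dispatch. The class names and the `evaluate` signature are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CRTP base: lets free functions accept any expression and recover its type.
template<typename Derived>
class Expression {
public:
  const Derived& cast () const { return static_cast<const Derived&>(*this); }
};

// Leaf node. std::vector<double> stands in for a 1d view in this sketch.
class VectorExpression : public Expression<VectorExpression> {
public:
  explicit VectorExpression (const std::vector<double>& v) : m_v(v) {}
  double eval (std::size_t i) const { assert(i < m_v.size()); return m_v[i]; }
private:
  const std::vector<double>& m_v; // non-owning: the vector must outlive the expression
};

// Sum of two sub-expressions, stored by value so temporaries stay valid.
template<typename ELeft, typename ERight>
class SumExpression : public Expression<SumExpression<ELeft,ERight>> {
public:
  SumExpression (const ELeft& l, const ERight& r) : m_l(l), m_r(r) {}
  double eval (std::size_t i) const { return m_l.eval(i) + m_r.eval(i); }
private:
  ELeft  m_l;
  ERight m_r;
};

// exp applied entry-wise to a sub-expression.
template<typename EArg>
class ExpExpression : public Expression<ExpExpression<EArg>> {
public:
  explicit ExpExpression (const EArg& a) : m_a(a) {}
  double eval (std::size_t i) const { return std::exp(m_a.eval(i)); }
private:
  EArg m_a;
};

// Building an expression does no arithmetic: it only assembles a type.
template<typename ELeft, typename ERight>
SumExpression<ELeft,ERight>
operator+ (const Expression<ELeft>& l, const Expression<ERight>& r) {
  return SumExpression<ELeft,ERight>(l.cast(), r.cast());
}

template<typename EArg>
ExpExpression<EArg> exp (const Expression<EArg>& a) {
  return ExpExpression<EArg>(a.cast());
}

// Flat loop over the output. A serial loop stands in for the parallel for.
template<typename E>
void evaluate (const Expression<E>& e, std::vector<double>& result) {
  const E& expr = e.cast();
  for (std::size_t i=0; i<result.size(); ++i) {
    result[i] = expr.eval(i);
  }
}
```

With vectors `x`, `y`, and `z` of equal length, usage would look like `auto expr = VectorExpression(x) + exp(VectorExpression(y));` followed by `evaluate(expr, z);`. All the work happens in the single loop inside `evaluate`, with no temporaries for intermediate results.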
So far, it only supports 1d, 2d, and 3d views, and always evaluates the result as a double. If needed, I think we can easily expand to a generic type, at the cost of increasing the template signature of each class.
Testing
Added a unit test for 1d, 2d, and 3d views. It performs binary ops, simple math fcns, and ternary-operator-like evaluations.
Additional Information
I went back and forth between a couple of impl choices, namely whether eval should accept indices (like eval(int i, int j)) or accept no inputs and instead require setting the evaluation data in the expression at each iteration. I think the former is prob faster, so I went with that, at the cost of increasing the number of methods to implement.
A few things I'd like to try:
z = x + reduction(y,1), which reduces y along the 2nd dim, then adds the result to x.
Have evaluate switch between range and team policies, based on what the expression needs. This would require the expression to expose some "policy data", which could be something like "range is fine", or "needs team". This can prob be done at compile time, as each expression knows at compile time what kind of work it does, and the binary ops that compose expressions can ping inner expressions for this info.
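The compile-time "policy data" idea could look roughly like the sketch below. This is speculative and does not reflect anything implemented in the PR: `PolicyKind`, `LeafExpression`, `ReductionExpression`, and `chosen_policy` are hypothetical names, and the actual Kokkos dispatch is replaced by a function that only reports which branch would be taken.

```cpp
// What kind of parallel dispatch an expression needs (hypothetical).
enum class PolicyKind { Range, Team };

// A plain view-like leaf is fine with a flat range.
struct LeafExpression {
  static constexpr PolicyKind policy = PolicyKind::Range;
};

// A reduction along one dimension is assumed to need team-level parallelism.
template<typename EArg>
struct ReductionExpression {
  static constexpr PolicyKind policy = PolicyKind::Team;
};

// A binary op asks its operands: if either needs a team, so does the op.
template<typename ELeft, typename ERight>
struct SumExpression {
  static constexpr PolicyKind policy =
    (ELeft::policy == PolicyKind::Team || ERight::policy == PolicyKind::Team)
      ? PolicyKind::Team : PolicyKind::Range;
};

// Stand-in for evaluate: the branch is picked at compile time, so the unused
// dispatch path would not even be instantiated.
template<typename E>
constexpr PolicyKind chosen_policy () {
  if constexpr (E::policy == PolicyKind::Team) {
    return PolicyKind::Team;   // here one would launch with a team policy
  } else {
    return PolicyKind::Range;  // here one would launch with a (MD)range policy
  }
}

// z = x + reduction(y,1) would need a team; z = x + y would not.
using WithReduction = SumExpression<LeafExpression, ReductionExpression<LeafExpression>>;
using Plain         = SumExpression<LeafExpression, LeafExpression>;
static_assert(chosen_policy<WithReduction>() == PolicyKind::Team, "");
static_assert(chosen_policy<Plain>() == PolicyKind::Range, "");
```

Because the trait is a static member, the information propagates through arbitrarily nested expressions without any runtime cost.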